This report is for ETC5521 Assignment 1 by Team emu, comprising Justin Thomas and Mayunk Bharadwaj.
Using the data provided on the ‘tidytuesday’ platform, our primary aim is to identify the characteristics of a winning beach volleyball team for both males and females.
We believe that winning and losing teams may differ in their characteristics because of, for example, the prevalence of beach volleyball in certain countries. We also theorise that taller and younger players may be better at beach volleyball because of the competitive advantage they may have over shorter and more seasoned players.
Therefore, the secondary questions that will help us answer our primary question are:
This report describes the data, including its source and limitations; explains how the data was cleaned; presents an analysis that answers the above questions; and ends with a conclusion.
While going through the data set, we found that it was incomplete: there were multiple ‘NA’ values for individual player performance statistics. Observations containing ‘NA’ values were therefore removed, as they were unlikely to be helpful in our analysis.
Primary Question
What are the characteristics of a winning beach volleyball team for both males and females?
Secondary Questions
This data set provides beach volleyball statistics for men’s and women’s matches from two major competitions: the Fédération Internationale de Volleyball (FIVB) Beach Volleyball World Championships and the Association of Volleyball Professionals (AVP) tour. Matches are played between teams of two. The data set records tournament information, player information, player performance statistics and match results. It covers September 2000 to August 2019 and was compiled from data recorded at the tournaments.
The original data source, created by Adam Vagner, initially covered September 2000 to July 2017; it has since been updated periodically, most recently in May 2020. It can be found at https://github.com/BigTimeStats/beach-volleyball.
The structure of the data set is:
There are 65 variables in this data set:
| Variable Name |
|---|
| circuit |
| tournament |
| country |
| year |
| date |
| gender |
| match_num |
| w_player1 |
| w_p1_birthdate |
| w_p1_age |
| w_p1_hgt |
| w_p1_country |
| w_player2 |
| w_p2_birthdate |
| w_p2_age |
| w_p2_hgt |
| w_p2_country |
| w_rank |
| l_player1 |
| l_p1_birthdate |
| l_p1_age |
| l_p1_hgt |
| l_p1_country |
| l_player2 |
| l_p2_birthdate |
| l_p2_age |
| l_p2_hgt |
| l_p2_country |
| l_rank |
| score |
| duration |
| bracket |
| round |
| w_p1_tot_attacks |
| w_p1_tot_kills |
| w_p1_tot_errors |
| w_p1_tot_hitpct |
| w_p1_tot_aces |
| w_p1_tot_serve_errors |
| w_p1_tot_blocks |
| w_p1_tot_digs |
| w_p2_tot_attacks |
| w_p2_tot_kills |
| w_p2_tot_errors |
| w_p2_tot_hitpct |
| w_p2_tot_aces |
| w_p2_tot_serve_errors |
| w_p2_tot_blocks |
| w_p2_tot_digs |
| l_p1_tot_attacks |
| l_p1_tot_kills |
| l_p1_tot_errors |
| l_p1_tot_hitpct |
| l_p1_tot_aces |
| l_p1_tot_serve_errors |
| l_p1_tot_blocks |
| l_p1_tot_digs |
| l_p2_tot_attacks |
| l_p2_tot_kills |
| l_p2_tot_errors |
| l_p2_tot_hitpct |
| l_p2_tot_aces |
| l_p2_tot_serve_errors |
| l_p2_tot_blocks |
| l_p2_tot_digs |
Our data was already in tidy format, so we did not have much cleaning to do. However, in order to conduct our analysis, we tidied the data set by removing variables that are not pertinent to answering our questions.
The methods we used to tidy our data are as follows:
We did not include variables such as match duration or individual player performance statistics because they do not help answer the questions we have laid out. Additionally, the majority of the values for these variables were missing, so they would not have been useful in our analysis.
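The cleaning approach described above can be illustrated with a minimal sketch. This is not the authors' code: it is plain Python, the three-row extract is made up, and the column subset in `KEEP` and the helper `select_and_drop_na` are hypothetical names chosen for illustration.

```python
import csv
import io

# Hypothetical three-row extract; the real file has 65 columns.
SAMPLE_CSV = """circuit,gender,w_p1_age,w_p1_hgt,duration,w_p1_tot_aces
AVP,M,29.4,76,00:33:00,NA
FIVB,W,NA,70,NA,NA
FIVB,W,27.9,71,NA,2
"""

# Illustrative subset of the variables retained for the analysis.
KEEP = ["circuit", "gender", "w_p1_age", "w_p1_hgt"]


def select_and_drop_na(rows, keep, na_token="NA"):
    """Keep only the `keep` columns, then drop rows missing any of them."""
    cleaned = []
    for row in rows:
        subset = {name: row[name] for name in keep}
        if all(value not in ("", na_token) for value in subset.values()):
            cleaned.append(subset)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = select_and_drop_na(rows, KEEP)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

In this sketch the columns are selected before the missing values are dropped, so a match that lacks only performance statistics is still retained. Dropping incomplete rows first would discard far more matches when most performance values are missing.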
| Variable | Description |
|---|---|
| circuit | Either AVP (USA) or FIVB (International) |
| country | Country where tournament played |
| year | Year of tournament |
| date | Date of match |
| gender | Gender of team |
| w_player1 | Winner player 1 name |
| w_p1_birthdate | Winner player 1 birth date |
| w_p1_age | Winner player 1 age |
| w_p1_hgt | Winner player 1 height in inches |
| w_p1_country | Winner player 1 country |
| w_player2 | Winner player 2 name |
| w_p2_birthdate | Winner player 2 birth date |
| w_p2_age | Winner player 2 age |
| w_p2_hgt | Winner player 2 height in inches |
| w_p2_country | Winner player 2 country |
| l_player1 | Losing player 1 name |
| l_p1_birthdate | Losing player 1 birth date |
| l_p1_age | Losing player 1 age |
| l_p1_hgt | Losing player 1 height in inches |
| l_p1_country | Losing player 1 country |
| l_player2 | Losing player 2 name |
| l_p2_birthdate | Losing player 2 birth date |
| l_p2_age | Losing player 2 age |
| l_p2_hgt | Losing player 2 height in inches |
| l_p2_country | Losing player 2 country |
| score | Match score, with points separated by a dash and sets separated by a comma, e.g. 21 points to 12 points is 21-12 |
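As an illustration of the `score` format described in the table, the following Python sketch splits a score string into one pair of points per comma-separated part. The helper `parse_score` is hypothetical and assumes every entry is well formed; any entry that does not follow the dash-and-comma pattern would need separate handling.

```python
def parse_score(score):
    """Split a score such as '21-12, 21-18' into (winner, loser) points per part."""
    parts = []
    for part in score.split(","):
        winner_pts, loser_pts = part.strip().split("-")
        parts.append((int(winner_pts), int(loser_pts)))
    return parts


example = parse_score("21-12, 21-18")
print(example)  # [(21, 12), (21, 18)]
```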
The original data is sourced from: Vagner, A. (2020, July 20). BigTimeStats/beach-volleyball. Retrieved August 22, 2020, from https://github.com/BigTimeStats/beach-volleyball
The data set was loaded from the “Tidy Tuesday” GitHub repository: Mock, J. (2020, May 19). rfordatascience/tidytuesday. Retrieved August 22, 2020, from https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-05-19/readme.md
For both the AVP and FIVB tournaments, a team consists of two players, who may come from the same country or from different countries. In this section, our analysis focuses on finding the countries with the largest number of winning teams. This will help us find the countries that produced the most winning players.
In order to find our answer to this question, we first did some data wrangling to get the data set up for analysis. Then we followed the steps outlined below:
Figure 3.1: Top 20 countries with the most winning teams
Figure 3.1 shows the 20 countries with the largest number of winning teams. The United States was the most dominant country, with a total of 4200 winning teams. As each team has two players, this corresponds to 8400 winning player appearances from the United States. In a distant second place, Brazil had 258 winning teams, or 516 winning player appearances. Germany was a close third with 200 winning teams, or 400 winning player appearances. The remaining countries ranged from 166 down to 45 winning teams.
The clear winner here is the United States, and we can conclude that the majority of the winning players in the AVP and FIVB tournaments hail from the United States.
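The counting behind a ranking like this can be sketched as follows. This is a minimal Python illustration rather than the authors' code. It assumes that a winning team is credited to a country only when both players come from that country, and the match records are made up.

```python
from collections import Counter

# Hypothetical winning-team records: (w_p1_country, w_p2_country) per match.
winners = [
    ("United States", "United States"),
    ("United States", "United States"),
    ("Brazil", "Brazil"),
    ("United States", "Brazil"),  # mixed team, not credited to one country
    ("Germany", "Germany"),
    ("Brazil", "Brazil"),
    ("United States", "United States"),
]


def winning_teams_by_country(pairs):
    """Count winning teams whose two players share a country (an assumption)."""
    return Counter(p1 for p1, p2 in pairs if p1 == p2)


counts = winning_teams_by_country(winners)
top = counts.most_common(20)  # the 20 countries with the most winning teams
for country, teams in top:
    # Each team has two players, so player appearances are twice the team count.
    print(country, teams, teams * 2)
```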
We decided to dig further into the United States. Although there were 4200 teams in which both players came from the United States, there were also instances where one player came from the United States and the other came from a different country. The following section looks at the countries whose players partnered with players from the United States.
In order to find the different countries that partnered with the United States, we followed the steps outlined below:
This gave us a list of all the country combinations in which either player 1 or player 2 came from the United States, together with the country of the other player.
| Player 1 country | Player 2 country | Number of teams |
|---|---|---|
| United States | United States | 4200 |
| United States | Brazil | 44 |
| Poland | United States | 34 |
| Canada | United States | 25 |
| Brazil | United States | 24 |
| United States | Canada | 23 |
| United States | Poland | 20 |
| Virgin Islands | United States | 19 |
| United States | Australia | 18 |
| United States | England | 16 |
| United States | Puerto Rico | 15 |
| United States | Virgin Islands | 14 |
| Puerto Rico | United States | 13 |
| Greece | United States | 12 |
| United States | Israel | 11 |
| Italy | United States | 10 |
| United States | France | 10 |
| Philippines | United States | 9 |
| Costa Rica | United States | 8 |
| England | United States | 8 |
Table 3.1 shows 20 different country combinations, which is only a subset of the different countries that partnered with the United States. In total there were 66 different combinations.
Apart from both players coming from the United States, 44 different teams had player 1 come from the United States and player 2 come from Brazil. 34 teams had player 1 come from Poland and player 2 come from the United States.
The rest of the table shows how prominent the United States is in these volleyball tournaments. It appears not only in teams where both players come from the United States, but also in teams where one player comes from the United States and partners with a player from a different country.
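Table 3.1 counts each ordering of player 1 and player 2 separately, so a partner country can appear in two rows. As a worked example (in Python, not the authors' code), the sketch below merges the mirrored rows of the 20 combinations shown to give one total per partner country. The helper name `partner_totals` is hypothetical, and the totals cover only the rows shown, not all 66 combinations.

```python
from collections import Counter

US = "United States"

# Rows copied from Table 3.1: (player 1 country, player 2 country, teams).
TABLE_3_1 = [
    (US, US, 4200), (US, "Brazil", 44), ("Poland", US, 34),
    ("Canada", US, 25), ("Brazil", US, 24), (US, "Canada", 23),
    (US, "Poland", 20), ("Virgin Islands", US, 19), (US, "Australia", 18),
    (US, "England", 16), (US, "Puerto Rico", 15), (US, "Virgin Islands", 14),
    ("Puerto Rico", US, 13), ("Greece", US, 12), (US, "Israel", 11),
    ("Italy", US, 10), (US, "France", 10), ("Philippines", US, 9),
    ("Costa Rica", US, 8), ("England", US, 8),
]


def partner_totals(rows, home=US):
    """Sum teams per partner country, ignoring which player slot is `home`."""
    totals = Counter()
    for p1, p2, teams in rows:
        if home not in (p1, p2) or p1 == p2:
            continue  # skip rows without a home player and all-home teams
        partner = p2 if p1 == home else p1
        totals[partner] += teams
    return totals


totals = partner_totals(TABLE_3_1)
for country, teams in totals.most_common(5):
    print(country, teams)
# Brazil 68
# Poland 54
# Canada 48
# Virgin Islands 33
# Puerto Rico 28
```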
N.B. For the method used to complete this analysis, please refer to the commentary included within the code chunks.
The average ages of male winning players 1 and 2 are 29.40 and 29.32 respectively, while the average ages of male losing players 1 and 2 are 29.08 and 28.95. There is no obvious link between age and winning or losing, as the average ages of winners and losers are about the same. The data may, however, tell us something about the typical age of participation in professional men’s volleyball. If we plot the ages of, for instance, male winning player 1 (Figure 3.2) and male losing player 2 (Figure 3.3), we see that the most common ages are in the late 20s (28 to 29 years of age). Given the high levels of participation at those ages, it is reasonable to infer that male volleyball players hit their peak in their late 20s.
Now, let’s consider women’s volleyball. The average ages of female winning players 1 and 2 are 27.98 and 28.29 respectively, and the average ages of female losing players 1 and 2 are 27.52 and 27.73. As in the men’s game, age does not seem to strongly influence winning. However, it is interesting to note that there is a slight difference between the genders in the average ages of winning and losing players. Figure 3.4 shows that the average age of winning player 2 is lower for females than for males. Similarly, Figure 3.5 shows that the average age of losing player 1 is also lower for females than for males.
Figure 3.2: Ages of Male Winning Player 1
Figure 3.3: Ages of Male Losing Player 2
Figure 3.4: Ages of Winning Player 2 by gender
Figure 3.5: Ages of Losing Player 1 by gender
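To make "about the same" concrete, the following Python sketch computes the gap between winners and losers from the average ages reported in the text. It is a worked calculation on the reported means only, not the authors' code, and the names used are hypothetical.

```python
# Average ages reported in the text: (gender, player slot) -> (winners, losers).
REPORTED_MEAN_AGE = {
    ("male", 1): (29.40, 29.08),
    ("male", 2): (29.32, 28.95),
    ("female", 1): (27.98, 27.52),
    ("female", 2): (28.29, 27.73),
}


def winner_minus_loser(means):
    """Return the winner-minus-loser gap in years, rounded to two decimals."""
    return {key: round(w - l, 2) for key, (w, l) in means.items()}


gaps = winner_minus_loser(REPORTED_MEAN_AGE)
for (gender, slot), gap in gaps.items():
    print(f"{gender} player {slot}: winners older by {gap:.2f} years")
# male player 1: winners older by 0.32 years
# male player 2: winners older by 0.37 years
# female player 1: winners older by 0.46 years
# female player 2: winners older by 0.56 years
```

Every gap is under 0.6 years, which is consistent with the observation that age does not strongly separate winners from losers.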
N.B. For the method used to complete this analysis, please refer to the commentary included within the code chunks.
The average heights of female winning players 1 and 2 are 70.91 and 70.85 inches respectively, and the average heights of female losing players 1 and 2 are 70.62 and 70.72 inches. Although the losing players are shorter on average than the winning players, the difference is small (less than 0.3 inches).
The average heights of male winning players 1 and 2 are 76.28 and 76.39 inches respectively, compared with 75.98 and 76.15 inches for losing players 1 and 2. Figures 3.6 and 3.7 display the differences in height between male winning and losing player 1 (Figure 3.6) and between male winning and losing player 2 (Figure 3.7). In both cases, the differences in height are fairly evenly centred around 0, so we probably cannot say that a height difference affects winning a volleyball game. We can, however, say that male volleyball participants are generally taller than female participants, although this is not unique to volleyball.
Figure 3.6: Difference in Heights of Male Player 1
Figure 3.7: Difference in Heights of Male Player 2
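A per-match height difference of the kind plotted in these figures could be computed along the following lines. This is a hedged Python sketch, not the authors' code: the heights are made up, the difference is assumed to be winner minus loser, and matches with a missing height are assumed to be skipped.

```python
from statistics import mean

# Hypothetical matches: (w_p1_hgt, l_p1_hgt) in inches; None marks a missing height.
matches = [(77, 75), (76, 78), (74, 74), (None, 76), (78, 77), (76, 76)]


def height_differences(pairs):
    """Winner height minus loser height for each match with both values present."""
    return [w - l for w, l in pairs if w is not None and l is not None]


diffs = height_differences(matches)
print(diffs)        # [2, -2, 0, 1, 0]
print(mean(diffs))  # 0.2
```

A distribution of such differences that is centred near 0 indicates that neither the taller nor the shorter player wins systematically more often.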
From our analysis, we conclude that a typical winning male volleyball team probably has both players originating from the United States, with player one having an average age of 29.40 and an average height of 76.28 inches, and player two having an average age of 29.32 and an average height of 76.39 inches.
Similarly, a typical winning female volleyball team probably has both players originating from the United States, with player one having an average age of 27.98 and an average height of 70.91 inches, and player two having an average age of 28.29 and an average height of 70.85 inches.
Mock, J. (2020, May 19). rfordatascience/tidytuesday. Retrieved August 22, 2020, from https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-05-19/readme.md
Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida.
Vagner, A. (2020, July 20). BigTimeStats/beach-volleyball. Retrieved August 22, 2020, from https://github.com/BigTimeStats/beach-volleyball
Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Zhu, H. (2019). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.1.0. https://CRAN.R-project.org/package=kableExtra